Instructions

Create a web page presentation using R Markdown that features a plot created with Plotly. Host your webpage on either GitHub Pages, RPubs, or NeoCities. Your webpage must contain the date that you created the document, and it must contain a plot created with Plotly.

Loading the data

Here I am again, loading the “Cyclistic” dataset provided by Google. It consists of 12 CSV files, one per month, containing geocoded bike-share usage data for Chicago. For the Plotly visualizations I will use the ggplotly() function, because it lets me build the plots in ggplot2, which I find easier to work with.

librarian::shelf(dplyr, purrr, data.table, lubridate, ggplot2, plotly)
## 
##   The 'cran_repo' argument in shelf() was not set, so it will use
##   cran_repo = 'https://cran.r-project.org' by default.
## 
##   To avoid this message, set the 'cran_repo' argument to a CRAN
##   mirror URL (see https://cran.r-project.org/mirrors.html) or set
##   'quiet = TRUE'.
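# list.files() expects a regular expression, so match the ".csv" extension explicitly
mydata <- rbindlist(lapply(list.files(pattern = "\\.csv$"), fread), use.names = TRUE)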


mydata <- mydata %>% 
      select(-c(ride_id, start_station_name, end_station_name)) %>%
      # parse the timestamps first, then derive the other columns from them
      mutate(started_at = ymd_hms(started_at),
             ended_at = ymd_hms(ended_at),
             # explicit units, so ride_length is always in seconds
             ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")),
             day = factor(weekdays(started_at), c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday", "Sunday")),
             member_casual = as.factor(stringr::str_to_title(member_casual)))

mydata$ride_length <- replace(mydata$ride_length, which(mydata$ride_length < 0), NA)

summary(mydata)
##  rideable_type        started_at                     ended_at                  
##  Length:4073561     Min.   :2020-06-03 05:59:59   Min.   :2020-06-03 06:03:37  
##  Class :character   1st Qu.:2020-08-07 19:09:29   1st Qu.:2020-08-07 19:39:10  
##  Mode  :character   Median :2020-09-30 07:36:28   Median :2020-09-30 07:51:42  
##                     Mean   :2020-11-08 09:07:38   Mean   :2020-11-08 09:31:51  
##                     3rd Qu.:2021-03-13 10:03:09   3rd Qu.:2021-03-13 10:22:00  
##                     Max.   :2021-05-31 23:59:16   Max.   :2021-06-10 22:17:11  
##                                                                                
##  start_station_id   end_station_id       start_lat       start_lng     
##  Length:4073561     Length:4073561     Min.   :41.64   Min.   :-87.87  
##  Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
##  Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
##                                        Mean   :41.90   Mean   :-87.64  
##                                        3rd Qu.:41.93   3rd Qu.:-87.63  
##                                        Max.   :42.08   Max.   :-87.52  
##                                                                        
##     end_lat         end_lng       member_casual     ride_length     
##  Min.   :41.54   Min.   :-88.07   Casual:1713356   Min.   :      0  
##  1st Qu.:41.88   1st Qu.:-87.66   Member:2360205   1st Qu.:    462  
##  Median :41.90   Median :-87.64                    Median :    843  
##  Mean   :41.90   Mean   :-87.64                    Mean   :   1617  
##  3rd Qu.:41.93   3rd Qu.:-87.63                    3rd Qu.:   1554  
##  Max.   :42.16   Max.   :-87.44                    Max.   :3257001  
##  NA's   :5037    NA's   :5037                      NA's   :10336    
##         day        
##  Monday   :504915  
##  Tuesday  :505455  
##  Wednesday:531021  
##  Thursday :534848  
##  Friday   :599049  
##  Saturday :758984  
##  Sunday   :639289
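
The ride-length step above can be illustrated on a few toy timestamps. This is only a minimal sketch: the `rides` data frame and its values are hypothetical and not taken from the Cyclistic files. It shows why fixing the unit with difftime() is safer than plain subtraction, and how negative durations end up as NA.

```r
# Toy timestamps (hypothetical values, not taken from the Cyclistic data)
rides <- data.frame(
  started_at = as.POSIXct(c("2021-05-01 08:00:00", "2021-05-01 09:00:00",
                            "2021-05-01 10:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2021-05-01 08:10:00", "2021-05-01 11:00:00",
                            "2021-05-01 09:59:00"), tz = "UTC")
)

# Subtracting POSIXct values lets R pick the unit automatically;
# difftime() with units = "secs" pins it to seconds.
rides$ride_length <- as.numeric(
  difftime(rides$ended_at, rides$started_at, units = "secs")
)

# A ride that ends before it starts is treated as a bad record
rides$ride_length[which(rides$ride_length < 0)] <- NA

print(rides$ride_length)  # 600, 7200, NA
```

The third toy ride ends one minute before it starts, so it is the one replaced by NA.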

First Plot: Ride Length

This histogram shows ride length in seconds, split by member category, for rides of up to 5000 seconds. As you can see, non-members (“Casual”) tend to take longer rides.

ggplotly(
   ggplot(data = mydata) +
      # bins = 30 is the ggplot2 default; stating it avoids the stat_bin() message
      geom_histogram(aes(x = ride_length, fill = member_casual), bins = 30) +
      xlim(0,5000) +
      labs(fill = "Casual User or Member?") +
      xlab("Ride Length (s)") +
      ggtitle("Histogram of Ride Length by Member Status") +
      theme_bw()
)
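
The summary above reports a mean ride length (1617 s) well above the median (843 s), with a maximum of 3257001 s, so a few very long rides pull the mean upward. One way to back up the histogram with numbers is to compare the median per group. The sketch below uses a small hypothetical data frame (`toy`, with made-up values), not the real dataset.

```r
library(dplyr)

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  member_casual = c("Casual", "Casual", "Casual", "Member", "Member", "Member"),
  ride_length   = c(900, 1500, 40000, 400, 600, NA)
)

# The median is less sensitive than the mean to a handful of very long rides
length_summary <- toy %>%
  group_by(member_casual) %>%
  summarise(
    median_length = median(ride_length, na.rm = TRUE),
    mean_length   = mean(ride_length, na.rm = TRUE),
    .groups = "drop"
  )

print(length_summary)
```

Running the same group_by() and summarise() calls on mydata would give the per-group figures for the real rides.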

Second Plot: Day of the Week

This graph shows the number of rides on each day of the week, split by member category. As you can see, member bike usage is less dependent on the day, while non-member usage is concentrated on the weekends.

ggplotly(
   ggplot(mydata) +
      geom_bar(aes(x=day, fill = member_casual), position = 'dodge')+
      labs(fill = "Casual User or Member?") +
      xlab("Day of the Week") +
      ggtitle("Frequencies of Weekday Usage by Member Status") +
      theme_bw()
)
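
The weekend pattern can also be expressed as a single number per group: the share of rides that fall on Saturday or Sunday. The following is a minimal sketch on a hypothetical data frame (`toy`, with made-up rows), assuming the same `day` and `member_casual` columns as above.

```r
library(dplyr)

# Toy data (hypothetical rows, for illustration only)
toy <- data.frame(
  member_casual = c("Casual", "Casual", "Casual", "Casual",
                    "Member", "Member", "Member", "Member"),
  day           = c("Saturday", "Sunday", "Saturday", "Monday",
                    "Monday", "Tuesday", "Saturday", "Friday")
)

# The mean of a logical vector is the proportion of TRUE values,
# here the fraction of rides taken on a weekend day
weekend_share <- toy %>%
  group_by(member_casual) %>%
  summarise(
    weekend_share = mean(day %in% c("Saturday", "Sunday")),
    .groups = "drop"
  )

print(weekend_share)
```

A higher share for casual riders than for members would match what the bar chart shows.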